A Two-Stage Break-and-Cull Model for Automatic Indexing of Persian Texts

Author

Mohammad Tavakolizadeh Ravari, Yazd University

Abstract:

Purpose: Every language poses its own problems for automatic indexing, so each one calls for a model suited to it. Such a model should address both the exhaustivity and the specificity of indexing. This paper introduces and evaluates a model suited to the automatic indexing of Persian texts. The model breaks the text into candidate terms and then culls the most appropriate ones through a special term-weighting method. Methodology: The model is introduced by presenting its steps and the problems that may arise in running them. Evaluation is based on the inclusion index, a measure used to determine inter-indexer consistency. Accordingly, the consistency between the index terms produced by the model and the author keywords is determined. Findings: For 90% of the articles, the most highly weighted terms are similar to the first author keywords. The overall consistency between the model's output and the author keywords is 76%. Compared with prior work, the performance of the model is acceptable. Originality/Value: The primary value of this paper lies in addressing automatic indexing with regard to the problems of the Persian language. The model is well suited to implementation with regular expressions, which many programming languages support. This reduces the need to create database tables for text manipulation and processing. In addition, the model solves the problem of setting an upper threshold for determining the final terms, and another algorithm makes it possible to determine the lower one. Finally, the number of culled terms does not depend on the text length, which guarantees the exhaustivity and specificity of indexing.
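The two stages and the evaluation measure described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the author's implementation: the stop-word list, the frequency-based weight, the fixed `top_n` cutoff, and all function names are assumptions, since the abstract does not specify them. The inclusion index is taken here in a commonly used form (shared terms divided by the size of the smaller term set), which is also an assumption about the paper's exact formula.

```python
import re
from collections import Counter

# Hypothetical stop list; the paper's actual list of break words is not given.
STOP_WORDS = {"و", "در", "به", "از", "که", "را", "با", "برای", "این", "است"}

# Latin and Persian punctuation treated as hard boundaries between candidates.
BOUNDARY = re.compile(r"[.,;:!?؟،؛«»()\[\]\n]+")


def break_text(text, stop_words=STOP_WORDS):
    """Stage 1 (break): cut the text at punctuation and stop words,
    keeping each remaining run of words as one candidate term."""
    candidates = []
    for segment in BOUNDARY.split(text):
        current = []
        for word in segment.split():
            if word in stop_words:
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates


def cull_terms(candidates, top_n=5):
    """Stage 2 (cull): keep the highest-weighted candidates.
    The weight here is plain frequency, an assumed stand-in for the
    paper's term-weighting method."""
    counts = Counter(candidates)
    return [term for term, _ in counts.most_common(top_n)]


def inclusion_index(terms_a, terms_b):
    """Assumed form of the inclusion index:
    |A intersect B| / min(|A|, |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

In use, the culled terms for an article would be passed to `inclusion_index` together with that article's author keywords. Exact string matching is the simplest comparison; the abstract speaks of terms being "similar", so the paper may well apply a looser match.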

Similar resources

Identifying non-conceptual (common) words in the automatic indexing of Persian documents

This study was carried out to identify non-conceptual words in Persian and to compile a list of these words for the automatic indexing of Persian texts in psychology, educational sciences, and library and information science. The study used the content analysis method. The statistical population consists of the articles in the latest published issue of scholarly research journals in educational sciences, psychology, and library and inf...

Automatic identification of author gender in Persian texts

A gigantic amount of textual data is transferred on the web every day. Like other communities, cyberspace is vulnerable to attacks, false information, and deception. It is becoming increasingly important to design an efficient method to trace identity in this community. To investigate the problem of gender identification, we propose 48 features and design three machine learning algorithms. The results of the study ...

Automatic indexing (past, present, future)

A bid pricing model for two-stage tenders based on a hybrid approach

To help bidders participate effectively and successfully in two-stage tenders, this study presents a competitive bid pricing model that uses a hybrid approach to set the price in this type of tender. To this end, the win probability function is first derived from the tender owner's degree of preference; with it, the probability of winning the tender at different prices can be estimated from the estimated quality level. Then, by ...

Improving automatic summarization of Persian texts using natural language processing methods and a similarity graph

A significant amount of the available information is stored in textual databases, which contain large collections of documents from different sources (such as news, articles, books, emails, and web pages). The increasing visibility and importance of this class of information motivates us to work on better automatic evaluation tools for textual resources. The automatic summarization of tex...

Automatic multi-document summarization techniques for Persian texts based on metaheuristic algorithms

Purpose: To present a standard summarization model for Persian texts by converting the summarization problem into an optimization problem solved with suitable metaheuristic algorithms. Methodology: This study used for evaluation the standard documents of the multi-document corpus "Pasokh", which covers 50 different topics across various news genres from Iran's most-visited news agencies. Each topic contains 20 documents as well as 5 abstractive summaries ...

Journal title

Research on Information Science and Public Libraries (تحقیقات اطلاع رسانی کتابخانه های عمومی)

Volume 21, Issue 1

Pages 13-40

Publication date 2015-06

Keywords

No keywords have been provided for this article